Deterministic 1-k routing on meshes with applications to worm-hole routing
In 1-k routing, each of the processing units of an n x n mesh-connected computer initially holds one packet, which must be routed such that any processor is the destination of at most k packets. This problem reflects the practical desire for routing more general than the popular routing of permutations. 1-k routing also has implications for hot-potato worm-hole routing, which is of great importance for real-world systems. We present a near-optimal deterministic algorithm running in \sqrt{k} \cdot n/2 + O(n) steps. We give a second algorithm with slightly worse routing time but working queue size three. Applying this algorithm considerably reduces the routing time of hot-potato worm-hole routing. Non-trivial extensions are given to the general l-k routing problem and to routing on higher-dimensional meshes. Finally we show that k-k routing can be performed in O(k \cdot n) steps with working queue size four. Hereby the hot-potato worm-hole routing problem can be solved in O(k^{3/2} \cdot n) steps
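The defining constraint of a 1-k instance (every PU is the source of exactly one packet, and no PU is the destination of more than k packets) is easy to state as a check. The sketch below, with a hypothetical function name, only validates an instance on an n x n mesh; it does not implement the routing algorithm of the paper.

```python
from collections import Counter

def is_valid_1k_instance(dest, n, k):
    """Check a 1-k routing instance on an n x n mesh.

    dest maps each source PU (r, c) to a destination PU (r2, c2).
    Every PU is the source of exactly one packet; validity requires
    that no PU is the destination of more than k packets.
    """
    assert len(dest) == n * n, "each PU holds exactly one packet"
    counts = Counter(dest.values())
    return all(v <= k for v in counts.values())
```

A permutation is exactly the k = 1 case: every PU receives one packet, so `is_valid_1k_instance` holds with k = 1, while an instance sending all packets to one PU needs k at least n^2.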
Sample sort on meshes
This paper provides an overview of lower and upper bounds for mesh-connected processor networks. Most attention goes to routing and sorting problems, but other problems are mentioned as well. Results from 1977 to 1995 are covered. We provide numerous results, references and open problems. The text is completed with an index. This is a worked-out version of the author's contribution to a joint paper with Grammatikakis, Hsu and Kraetzl on multicomputer routing, submitted to JPDC
A powerful heuristic for telephone gossiping
A refined heuristic for computing schedules for gossiping in the telephone model is presented. The heuristic is fast: for a network with n nodes and m edges, requiring R rounds for gossiping, the running time is O(R n log(n) m) for all tested classes of graphs. This moderate time consumption allows computing gossiping schedules for networks with more than 10,000 PUs and 100,000 connections. The heuristic is good: in practice the computed schedules never exceed the optimum by more than a few rounds. The heuristic is versatile: it can also be used for broadcasting and more general information dispersion patterns. It can handle both the unit-cost and the linear-cost model. Actually, the heuristic is so good that for CCC, shuffle-exchange, butterfly, de Bruijn, star and pancake networks the constructed gossiping schedules are better than the best theoretically derived ones. For example, for gossiping on a shuffle-exchange network with 2^{13} PUs, the former upper bound was 49 rounds, while our heuristic finds a schedule requiring 31 rounds. Also for broadcasting the heuristic improves on many formerly known results. A second heuristic works even better for CCC, butterfly, star and pancake networks. For example, with this heuristic we found that gossiping on a pancake network with 7! PUs can be performed in 15 rounds, 2 fewer than achieved by the best theoretical construction. This second heuristic is less versatile than the first, but by refined search techniques it can tackle even larger problems, the main limitation being the storage capacity. Another advantage is that the constructed schedules can be represented concisely
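In the telephone model, each round is a set of disjoint calls (a matching in the network), and the two endpoints of a call exchange everything they know. The sketch below only simulates a given round schedule and reports when gossiping completes; the function name and example schedule are illustrative, not the heuristic of the paper.

```python
def simulate_gossip(n, schedule):
    """Simulate telephone gossiping: n nodes, node i initially knows {i}.

    schedule is a list of rounds; each round is a set of disjoint calls
    (i, j). In a call both endpoints learn each other's full knowledge.
    Returns the number of rounds after which every node knows every item,
    or None if the schedule does not complete gossiping.
    """
    know = [{i} for i in range(n)]
    for r, calls in enumerate(schedule, start=1):
        used = set()
        for i, j in calls:
            assert i not in used and j not in used, "calls must be disjoint"
            used.update((i, j))
            merged = know[i] | know[j]          # both sides learn everything
            know[i] = know[j] = merged
        if all(len(s) == n for s in know):
            return r
    return None
```

For four nodes, the two-round schedule that first pairs (0,1) and (2,3) and then pairs (0,2) and (1,3) completes gossiping, which matches the known optimum of 2 rounds for n = 4.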
Towards practical permutation routing on meshes
We consider the permutation routing problem on two-dimensional meshes. To be practical, a routing algorithm must guarantee very small queue sizes and very low running time, not only asymptotically but particularly also for the mesh sizes arising in practice. With a technique inspired by a scheme of Kaklamanis/Krizanc/Rao, we obtain a near-optimal routing time. Although this routing time is very attractive, the lower-order terms make the algorithm highly impractical. Therefore we present simple schemes which are asymptotically slower, but achieve low running times for {\em all} mesh sizes with queue sizes between 2 and 8
Vertex labeling and routing in expanded Apollonian networks
We present a family of networks, expanded deterministic Apollonian networks, which are a generalization of the Apollonian networks and are simultaneously scale-free, small-world, and highly clustered. We introduce a labeling of their vertices that allows determining a shortest-path routing between any two vertices of the network based only on the labels
External Selection
Sequential selection has been solved in linear time by Blum et al. Running this algorithm on a problem of size N exceeding the size of the main memory results in an algorithm that reads and writes O(N) elements, while the number of comparisons is also bounded by O(N). This is asymptotically optimal, but the constants are so large that in practice sorting is faster for most problem and memory sizes. This paper provides the first detailed study of the external selection problem. A randomized algorithm of a conventional type is close to optimal in all respects. Our deterministic algorithm is more or less the same, but first builds an index structure over all the elements. This effort is not wasted: the index structure allows the retrieval of elements so that we do not need a second scan through all the data. The index structure can also be used for repeated selections, and can be extended over time. The deterministic algorithm is optimal to within lower-order terms in both the number of elements read and the number written
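The linear-time selection of Blum et al. referred to above can be sketched as follows; this is the classic in-memory median-of-medians scheme, not the external variant the paper develops.

```python
def select(a, k):
    """Return the k-th smallest element (0-based) of list a.

    Classic median-of-medians selection of Blum et al.: the median of
    the medians of groups of five is guaranteed to be a good pivot,
    which bounds both recursions and yields worst-case linear time.
    """
    if len(a) <= 5:
        return sorted(a)[k]
    # Median of each group of five, then recursively their median as pivot.
    groups = [a[i:i + 5] for i in range(0, len(a), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    pivot = select(medians, len(medians) // 2)
    lo = [x for x in a if x < pivot]
    eq = [x for x in a if x == pivot]
    hi = [x for x in a if x > pivot]
    if k < len(lo):
        return select(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    return select(hi, k - len(lo) - len(eq))
```

The three-way partition around the pivot makes the sketch robust to duplicate elements.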
Ultimate Parallel List Ranking?
Two improved list-ranking algorithms are presented. The ``peeling-off'' algorithm leads to an optimal PRAM algorithm, but was designed with application on a real parallel machine in mind. It is simpler than earlier algorithms, and in a range of problem sizes where previously several algorithms were required for the best performance, this single algorithm now suffices. If the problem size is much larger than the number of available processors, then the ``sparse-ruling-sets'' algorithm is even better. In previous versions this algorithm had very restricted practical application because of the large number of communication rounds it performed. This weakness is overcome by adding two new ideas, each of which reduces the number of communication rounds by a factor of two
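For reference, the problem both algorithms solve can be stated through the textbook pointer-jumping baseline, sketched below; this O(log n)-round scheme is the one the ``peeling-off'' and ``sparse-ruling-sets'' algorithms improve on in practice, not an algorithm from the paper.

```python
def list_rank(succ):
    """Rank a linked list by pointer jumping.

    succ[i] is the successor of node i; the last node points to itself.
    Returns rank[i] = distance from node i to the end of the list.
    In each round every node doubles its jump by adopting its
    successor's pointer, so O(log n) synchronous rounds suffice.
    """
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    for _ in range(max(1, n).bit_length()):
        # Synchronous update: both comprehensions read the old arrays.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

The extra rounds beyond ceil(log2(n)) are harmless, since a node already pointing at the tail adds the tail's rank of 0.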
Better Trade-offs for Parallel List Ranking
An earlier parallel list-ranking algorithm performs well only for problem sizes that are extremely large in comparison to the number of PUs. However, no existing algorithm gives good performance for reasonable loads. We present a novel family of algorithms that achieve a better trade-off between the number of start-ups and the routing volume. We have implemented them on an Intel Paragon, and they turn out to considerably outperform all earlier algorithms: the sequential algorithm is already beaten for N = 25,000; for larger problem sizes the speed-up is 21, and it even reaches 30. A modification of one of our algorithms solves a theoretical question: we show that on one-dimensional processor arrays, list ranking can be solved with a number of steps equal to the diameter of the network
From parallel to external list ranking
Novel algorithms are presented for parallel and external-memory list ranking. The same algorithms can be used for computing basic tree functions, such as the depth of a node. The parallel algorithm stands out through its low memory use, its simplicity and its performance. For a large range of problem sizes, it is almost as fast as the fastest previous algorithms. On a Paragon with 100 PUs, each holding 10^6 nodes, we obtain a speed-up of 25. For external-memory list ranking, the best algorithm so far is an optimized version of independent-set removal. Actually, this algorithm is not good at all: for a list of length N, the paging volume is about 72 N. Our new algorithm reduces this to 18 N. The algorithm has been implemented, and the theoretical results are confirmed
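Independent-set removal, the technique the new algorithm improves on, can be sketched sequentially in memory as follows; the node selection uses the classic random coin flips, and this illustrates the baseline technique, not the paper's new algorithm.

```python
import random

def rank_isr(succ):
    """List ranking by independent-set removal (in-memory sketch).

    succ[i] is node i's successor; the tail points to itself. Each
    round removes an independent set (no removed node is followed by
    another removed node): a node is picked if its coin is heads and
    its successor's coin is tails. Each picked node is spliced out,
    its predecessor accumulating the bypassed distance. Ranks are
    restored by reinserting removed nodes in reverse order.
    """
    nxt = dict(enumerate(succ))
    w = {i: 0 if succ[i] == i else 1 for i in nxt}   # edge weights
    live = set(nxt)
    removed = []                                     # (node, succ, weight)
    while len(live) > 2:
        flip = {i: random.random() < 0.5 for i in live}
        indep = [i for i in live
                 if nxt[i] != i and flip[i] and not flip[nxt[i]]]
        pred = {nxt[i]: i for i in live if nxt[i] != i}
        for i in indep:
            removed.append((i, nxt[i], w[i]))
            if i in pred:                            # predecessor skips i
                p = pred[i]
                w[p] += w[i]
                nxt[p] = nxt[i]
            live.remove(i)
    rank = {}
    for i in live:                                   # rank the tiny core
        j, d = i, 0
        while nxt[j] != j:
            d += w[j]
            j = nxt[j]
        rank[i] = d
    for i, s, wi in reversed(removed):               # reinsert removed nodes
        rank[i] = rank[s] + wi
    return [rank[i] for i in range(len(succ))]
```

The coin-flip rule guarantees independence (a picked node's successor flipped tails, so it cannot itself be picked), which is what lets all splices in one round proceed without interfering.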